Multi-Armed Bandits for Spectrum Allocation in Multi-Agent Channel Bonding WLANs
Authors
Abstract
While dynamic channel bonding (DCB) has been proven to boost the capacity of wireless local area networks (WLANs) by adapting bandwidth on a per-frame basis, its performance is tied to the primary and secondary channel selection. Unfortunately, in uncoordinated high-density deployments where multiple basic service sets (BSSs) may potentially overlap, hand-crafted spectrum management techniques perform poorly given the complex hidden/exposed node interactions. To cope with such challenging Wi-Fi environments, in this paper we first identify the machine learning (ML) approaches applicable to the problem at hand and justify why model-free reinforcement learning (RL) suits it the most. We then design a complete RL framework and call into question whether the use of complex algorithms helps in the quest for rapid learning in realistic scenarios. Through extensive simulations, we derive that stateless RL in the form of lightweight multi-armed bandits (MABs) is an efficient solution for rapid adaptation, avoiding the definition of broad and/or meaningless states. In contrast to most current trends, we envision lightweight MABs as an appropriate alternative to cumbersome and slowly convergent methods such as Q-learning and, especially, deep reinforcement learning.
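To make the stateless-MAB idea concrete, the following is a minimal illustrative sketch (not the paper's framework or simulator): an epsilon-greedy bandit where each arm is a hypothetical (primary channel, bonded bandwidth) spectrum configuration and the reward is a stand-in for the normalized throughput a BSS would observe after acting. All names, channel labels, and reward values are assumptions for illustration only.

```python
import random

class EpsilonGreedyBandit:
    """Stateless epsilon-greedy multi-armed bandit (illustrative sketch)."""

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # times each arm was played
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select_arm(self):
        # Explore uniformly with probability epsilon, otherwise exploit
        # the arm with the highest estimated mean reward.
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental update of the running mean for the played arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Hypothetical action space: 2 primary channels x 2 bonding widths (MHz).
arms = [("ch36", 20), ("ch36", 40), ("ch40", 20), ("ch40", 40)]
bandit = EpsilonGreedyBandit(len(arms), epsilon=0.1)

def observed_throughput(arm_index):
    # Stand-in for the environment's feedback; here arm 1 is best on average.
    base = [0.4, 0.8, 0.5, 0.3][arm_index]
    return min(1.0, max(0.0, random.gauss(base, 0.05)))

random.seed(0)
for _ in range(2000):
    a = bandit.select_arm()
    bandit.update(a, observed_throughput(a))

best = max(range(len(arms)), key=lambda a: bandit.values[a])
print(arms[best])  # the configuration the bandit converged to
```

Because the learner keeps only one mean estimate per arm and no state-transition model, adaptation is cheap per step, which is the property the abstract contrasts with Q-learning and deep RL.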
Similar resources
Coordinated Versus Decentralized Exploration In Multi-Agent Multi-Armed Bandits
In this paper, we introduce a multi-agent multi-armed bandit-based model for ad hoc teamwork with expensive communication. The goal of the team is to maximize the total reward gained from pulling arms of a bandit over a number of epochs. In each epoch, each agent decides whether to pull an arm and hence collect a reward, or to broadcast the reward it obtained in the previous epoch to the team a...
Contextual Multi-Armed Bandits
We study contextual multi-armed bandit problems where the context comes from a metric space and the payoff satisfies a Lipschitz condition with respect to the metric. Abstractly, a contextual multi-armed bandit problem models a situation where, in a sequence of independent trials, an online algorithm chooses, based on a given context (side information), an action from a set of possible actions ...
Staged Multi-armed Bandits
In conventional multi-armed bandits (MAB) and other reinforcement learning methods, the learner sequentially chooses actions and obtains a reward (which can be possibly missing, delayed or erroneous) after each taken action. This reward is then used by the learner to improve its future decisions. However, in numerous applications, ranging from personalized patient treatment to personalized web-...
Mortal Multi-Armed Bandits
We formulate and study a new variant of the k-armed bandit problem, motivated by e-commerce applications. In our model, arms have (stochastic) lifetime after which they expire. In this setting an algorithm needs to continuously explore new arms, in contrast to the standard k-armed bandit model in which arms are available indefinitely and exploration is reduced once an optimal arm is identified ...
Regional Multi-Armed Bandits
We consider a variant of the classic multiarmed bandit problem where the expected reward of each arm is a function of an unknown parameter. The arms are divided into different groups, each of which has a common parameter. Therefore, when the player selects an arm at each time slot, information of other arms in the same group is also revealed. This regional bandit model naturally bridges the non...
Journal
Journal title: IEEE Access
Year: 2021
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2021.3114430